<?xml version="1.0" encoding="UTF-8"?><article article-type="normal" xml:lang="en">
   <front>
      <journal-meta>
         <journal-id journal-id-type="publisher-id">PALEVO</journal-id>
         <issn>1631-0683</issn>
         <publisher>
            <publisher-name>Elsevier</publisher-name>
         </publisher>
      </journal-meta>
      <article-meta>
         <article-id pub-id-type="pii">S1631-0683(07)00109-1</article-id>
         <article-id pub-id-type="doi">10.1016/j.crpv.2007.09.013</article-id>
         <article-categories>
            <subj-group subj-group-type="type">
               <subject>Research article</subject>
            </subj-group>
            <subj-group subj-group-type="heading">
               <subject>Systematic palaeontology (Palaeobotany)</subject>
            </subj-group>
         </article-categories>
         <title-group>
            <article-title>Three-item analysis: Hierarchical representation and treatment of missing and inapplicable data</article-title>
            <trans-title-group xml:lang="fr">
               <trans-title>Analyse à trois éléments : représentation hiérarchique et traitement des données manquantes et non applicables</trans-title>
            </trans-title-group>
         </title-group>
         <contrib-group content-type="editors">
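            <contrib contrib-type="editor">
               <name>
                  <surname>Broutin</surname>
                  <given-names>Jean</given-names>
               </name>
               <email/>
            </contrib>
            <contrib contrib-type="editor">
               <name>
                  <surname>Ricqlès</surname>
                  <given-names>Armand de</given-names>
               </name>
               <email/>
            </contrib>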
         </contrib-group>
         <contrib-group content-type="authors">
            <contrib contrib-type="author" corresp="yes">
               <name>
                  <surname>Zaragüeta-Bagils</surname>
                  <given-names>René</given-names>
               </name>
               <email>rzb@ccr.jussieu.fr</email>
               <xref rid="aff1" ref-type="aff">
                  <sup>a</sup>
               </xref>
            </contrib>
            <contrib contrib-type="author">
               <name>
                  <surname>Bourdon</surname>
                  <given-names>Estelle</given-names>
               </name>
               <email>bourdon@mnhn.fr</email>
               <xref rid="aff2" ref-type="aff">
                  <sup>b</sup>
               </xref>
               <xref rid="aff3" ref-type="aff">
                  <sup>c</sup>
               </xref>
            </contrib>
            <aff-alternatives id="aff1">
               <aff>
                  <label>a</label> Laboratoire “Informatique et systématique”, UMR 5143 CNRS, université Pierre-et-Marie-Curie (Paris-6), 12, rue Cuvier, 75005 Paris, France</aff>
            </aff-alternatives>
            <aff-alternatives id="aff2">
               <aff>
                  <label>b</label> UMR 7179 CNRS, université Pierre-et-Marie-Curie (Paris-6), 4, place Jussieu, 75005 Paris cedex 05, France</aff>
            </aff-alternatives>
            <aff-alternatives id="aff3">
               <aff>
                  <label>c</label> Collège de France, 11, place Marcelin-Berthelot, 75231 Paris cedex 05, France</aff>
            </aff-alternatives>
         </contrib-group>
         <pub-date-not-available/>
         <volume>6</volume>
         <issue seq="17">6-7</issue>
         <issue-id pub-id-type="pii">S1631-0683(07)X0038-1</issue-id>
         <issue-title>La paléobotanique et l'évolution du monde végétal : quelques problèmes d'actualité</issue-title>
         <issue-title xml:lang="en">Palaeobotany and evolution of the plant's world: some current problems</issue-title>
         <fpage seq="0" content-type="normal">527</fpage>
         <lpage content-type="normal">534</lpage>
         <history>
            <date date-type="received" iso-8601-date="2007-05-25"/>
            <date date-type="accepted" iso-8601-date="2007-09-20"/>
         </history>
         <permissions>
            <copyright-statement>© 2007 Académie des sciences. Published by Elsevier B.V. All rights reserved.</copyright-statement>
            <copyright-year>2007</copyright-year>
            <copyright-holder>Académie des sciences</copyright-holder>
         </permissions>
         <self-uri xmlns:xlink="http://www.w3.org/1999/xlink" content-type="application/pdf" xlink:href="main.pdf">
                        Full (PDF)
                    </self-uri>
         <abstract abstract-type="author">
            <p>Matrix-based methods, including parsimony programs, represent and treat in the same way missing data, for which every character state is possible, and inapplicable data, for which every character state is impossible. This is because the hierarchical nature of homology assessments cannot be represented in taxon–character matrices. We show that the hierarchical representation of hypotheses of homology used in three-item analysis permits the accurate treatment of both missing and inapplicable data.</p>
         </abstract>
         <trans-abstract abstract-type="author" xml:lang="fr">
            <p>Les méthodes qui utilisent des matrices, dont les programmes de parcimonie, représentent et traitent de la même manière les données manquantes, pour lesquelles tout état de caractère est possible, et les données non applicables, pour lesquelles tout état de caractère est impossible. Ce problème vient du fait que la nature hiérarchique des hypothèses d’homologie ne peut pas être représentée dans une matrice taxons–caractères. Nous montrons que la représentation hiérarchique des hypothèses d’homologie utilisée en analyse à trois éléments permet de traiter correctement les données manquantes et non applicables.</p>
         </trans-abstract>
         <kwd-group>
            <unstructured-kwd-group>Missing data, Inapplicable data, Matrix-based methods, Taxon–character matrices, Hierarchical representation of hypotheses of homology, Three-item analysis</unstructured-kwd-group>
         </kwd-group>
         <kwd-group xml:lang="fr">
            <unstructured-kwd-group>Données manquantes, Données non applicables, Matrices taxons–caractères, Représentation hiérarchique des hypothèses d’homologie, Analyse à trois éléments</unstructured-kwd-group>
         </kwd-group>
         <custom-meta-group>
            <custom-meta>
               <meta-name>presented</meta-name>
               <meta-value>Written on invitation of the Editorial Board</meta-value>
            </custom-meta>
         </custom-meta-group>
      </article-meta>
   </front>
   <body>
      <sec>
         <label>1</label>
         <title>Introduction</title>
         <p>Palaeontology or, more precisely, systematic studies that include fossils, are particular in two respects. The first is that fossils are the only source of temporal information in phylogenetic studies <xref rid="bib32" ref-type="bibr">[32]</xref>. The temporal information conveyed by fossils and its measurement will be dealt with elsewhere <xref rid="bib13" ref-type="bibr">[13]</xref>. The second peculiarity of palaeontology is the intrinsic incompleteness of fossil specimens, which makes it impossible to observe features that are not preserved. These features are coded with question marks in matrix-based computer programs. However, question marks are also used in other circumstances <xref rid="bib7" ref-type="bibr">[7]</xref> and <xref rid="bib10" ref-type="bibr">[10]</xref>. These include:<list>
               <list-item>
                  <label>•</label>
                  <p>ignorance due either to the unavailability of specimens for a given taxon or to the difficulty of relating two structures in some specimens as homologous;</p>
               </list-item>
               <list-item>
                  <label>•</label>
                  <p>polymorphism, i.e. taxa that show instances of more than one character state;</p>
               </list-item>
               <list-item>
                  <label>•</label>
                  <p>inapplicability of character states. This situation arises when a given character cannot be scored because the structure is absent in some organisms or taxa. For example, if the relationships among Embryophyta are analysed, the character “flowers” would be inapplicable in, e.g., Filicopsida, which have no flowers.</p>
               </list-item>
            </list>
         </p>
         <p>This study is concerned with true missing data, i.e. characters relating to parts that are either not preserved or unknown, and with inapplicable character states. In matrix-based methods, these cases are represented by question marks (generally coded as “?”, “–”, “N” or “*”). Because the same representation is used, current phylogenetic methods such as parsimony analysis treat unknown and inapplicable cases in exactly the same way. The relevance of this identical representation and treatment is examined here.</p>
      </sec>
      <sec>
         <label>2</label>
         <title>Representation of hypotheses of homology</title>
         <sec>
            <p>Systematics aims to establish hypotheses of homology and combine them in order to infer relationships among organisms and taxa. These relationships, and the taxa themselves, can be seen as the same <xref rid="bib18" ref-type="bibr">[18]</xref>, homologies being parts of taxa. Homology is the relationship that links homologs <xref rid="bib2" ref-type="bibr">[2]</xref>, <xref rid="bib17" ref-type="bibr">[17]</xref>, <xref rid="bib18" ref-type="bibr">[18]</xref> and <xref rid="bib31" ref-type="bibr">[31]</xref>. A homolog is, as defined by Owen <xref rid="bib31" ref-type="bibr">[31]</xref>, “the same organ in different animals or plants under every variety of form and function”. It is clear from Owen's definition that a homolog is a class of objects. Homology is, formally, a relationship among classes. Homology and character, on the one hand, and homolog and character state, on the other, can be regarded as synonyms <xref rid="bib2" ref-type="bibr">[2]</xref>. However, formalisation must not impose a constraint on the way systematists think. On the contrary, it is intended to express the systematist's idea in a formal language, i.e. a language devoid of ambiguity <xref rid="bib12" ref-type="bibr">[12]</xref>.</p>
         </sec>
         <sec>
            <p>A taxonomic sample of, say, plants is illustrated in the theoretical example of <xref rid="fig1" ref-type="fig">Fig. 1</xref>. A hypothesis of primary homology <xref rid="bib24" ref-type="bibr">[24]</xref> may be expressed by the systematist as “<italic>I think that there are some starry forms that are grey, and some others that are white, among the sample. I think that these colours may be a relevant argument for grouping some of the plants of the sample</italic>.” This hypothesis is illustrated in <xref rid="fig2" ref-type="fig">Fig. 2</xref>. Some of the operational taxonomic units (OTUs) are grouped into homologues, which correspond to the concepts defined by the colour of the starry form. The white starry form of a given OTU is a concrete object and constitutes an instance of the concept “white starry form”. The systematist has no theory of relationships concerning organisms with a black or dotted starry form, which are excluded from the homologues.</p>
         </sec>
      </sec>
      <sec>
         <label>3</label>
         <title>Representation vs. coding</title>
         <sec>
            <p>The problem of generalising a series of observations into hypotheses of homology is often considered a matter of coding. Systematists usually speak of coding characters into a matrix, and think about the most relevant way to code missing characters (e.g., <xref rid="bib27" ref-type="bibr">[27]</xref>). Kitching et al. <xref rid="bib10" ref-type="bibr">[10]</xref> define coding as “the conversion of original observations into a discrete alphanumeric format suitable for cladistic analysis”. Thus, coding refers to the simple transcription of information from ordinary language into the conventional signs of a code. However, one stage is apparently absent from Kitching et al.'s definition and from other theoretical works dealing with the subject <xref rid="bib1" ref-type="bibr">[1]</xref> and <xref rid="bib6" ref-type="bibr">[6]</xref>. This stage concerns the representation of the knowledge that a systematist extracts from a series of observations. A representation is, in the sense used here, “the act of making sensible an absent object or a concept via an image, a figure or a sign” (definition translated from the Petit Robert™ French dictionary). Systematists express hypotheses of homology, not series of observations. A hypothesis of homology is the representation of a concept that relates different objects as being the same <xref rid="bib19" ref-type="bibr">[19]</xref>, <xref rid="bib26" ref-type="bibr">[26]</xref> and <xref rid="bib31" ref-type="bibr">[31]</xref>. Only after this stage are a character and its character states coded using alphanumeric symbols. Coding is thus limited to the choice of symbols used to fill data matrices. Incidentally, “data matrices” are not matrices, but tables; moreover, they do not contain data, i.e. statements about unique objects, but concepts representing the “sameness” of different objects. In a “data matrix”, each homologue is usually coded as a number. In the case illustrated here, OTUs with a grey starry part might be given the code “1”. The confusion between coding and representing and, in general, between objects and concepts, is probably the source of a number of mistakes and misunderstandings in systematics and other disciplines <xref rid="bib5" ref-type="bibr">[5]</xref>.</p>
         </sec>
      </sec>
      <sec>
         <label>4</label>
         <title>Missing data</title>
         <sec>
            <p>The problem of the inclusion of fossil taxa has been discussed <xref rid="bib23" ref-type="bibr">[23]</xref> and there is wide agreement about the importance of including fossils in phylogenetic studies in spite of their incompleteness. Homology statements cannot be assessed if the parts of organisms concerned in the homology relationship are unknown <xref rid="bib9" ref-type="bibr">[9]</xref>. This has widely been seen as a source of problems <xref rid="bib1" ref-type="bibr">[1]</xref>, <xref rid="bib5" ref-type="bibr">[5]</xref>, <xref rid="bib6" ref-type="bibr">[6]</xref>, <xref rid="bib19" ref-type="bibr">[19]</xref>, <xref rid="bib24" ref-type="bibr">[24]</xref>, <xref rid="bib26" ref-type="bibr">[26]</xref> and <xref rid="bib27" ref-type="bibr">[27]</xref>, mainly because large amounts of missing data tend to increase the number of most parsimonious trees found, often decreasing the resolution of consensus trees. New algorithms, however, may merely obscure the procedure and hide misrepresentations, and they remain unable to propose solutions when the relevant information is absent. As Strauss et al. <xref rid="bib29" ref-type="bibr">[29]</xref> write, “despite the ample literature on missing-value estimation, there is still little empirical guidance for researchers”, and the situation will remain so, because if empirical evidence is not available, empirical guidance cannot be given in historical biology.</p>
         </sec>
         <sec>
            <p>
               <xref rid="fig3" ref-type="fig">Fig. 3</xref> shows a specimen that lacks its upper part. It is impossible to know whether there was a starry part in the living organism. Consequently, the question of its assignment to a homolog is pointless. However, the incomplete specimen cannot be excluded from any of the classes of the hypotheses of homology that concern the starry part. If a more complete specimen were found, it might fall under any of the concepts of the hierarchy, in Frege's terminology <xref rid="bib5" ref-type="bibr">[5]</xref>. Thus, missing features must be regarded as potentially belonging to all states because they cannot be excluded from any of them. In other words, missing data represent the idea that <italic>every state is possible</italic>.</p>
         </sec>
      </sec>
      <sec>
         <label>5</label>
         <title>Inapplicable data</title>
         <sec>
            <p>Inapplicable character states appear in matrices when some character states are sub-concepts of a more inclusive state, called a super-concept <xref rid="bib12" ref-type="bibr">[12]</xref> (<xref rid="fig4" ref-type="fig">Fig. 4</xref>). The hypothesis of primary homology that the systematist wishes to represent may be expressed as “I think that the presence of a starry form is a relevant argument for grouping some of the organisms of my sample. Among them, and only among them, some may be grouped together because they share a grey starry form, while others may be grouped because they share a white starry form.” It has been shown that matrices, i.e. tables, cannot represent this kind of hierarchical information <xref rid="bib2" ref-type="bibr">[2]</xref>. In order to bypass this inadequacy, systematists try to represent something similar using a complicated procedure (<xref rid="tbl1" ref-type="table">Table 1</xref>). The hierarchical hypothesis of primary homology is broken up into a series of independent binary characters (<xref rid="fig4" ref-type="fig">Fig. 4</xref>). The first one is intended to represent the presence of a starry form. The second and third characters represent the presence of a starry form of a particular colour. Question marks occur when organisms that lack a starry part have to be scored for the characters that require the presence of a starry part. In this case, the systematist is certain that both the character state “grey starry form” and the character state “white starry form” are absent in organisms devoid of a starry part. In other words, inapplicable character states correspond to the case in which <italic>every state is impossible</italic>.</p>
         </sec>
      </sec>
      <sec>
         <label>6</label>
         <title>Representation of missing and inapplicable data in matrix-based methods</title>
         <sec>
            <p>Matrix-based programs in general, and parsimony in particular, represent and code in the same way missing data, for which <italic>every state is possible</italic>, and inapplicable character states, for which <italic>every state is impossible</italic>. In addition, matrix-based methods apply the same treatment in order to minimise steps, no matter what the source of a question mark is. Some authors have suggested that a solution to this problem cannot be found until new algorithms are available <xref rid="bib8" ref-type="bibr">[8]</xref>, <xref rid="bib14" ref-type="bibr">[14]</xref> and <xref rid="bib15" ref-type="bibr">[15]</xref>. However, the problem does not concern better algorithms, but the relevance of representation of hypotheses of homology in standard parsimony.</p>
         </sec>
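         <sec>
            <p>The contrast between the two cases can be sketched as sets of states that remain admissible for an OTU. This is only an illustration: the state names, the labels “observed”, “missing” and “inapplicable”, and both function names are assumptions made for the sketch, not part of any parsimony program.</p>
            <preformat>
```python
# Hypothetical dependent states of the character "colour of the starry form".
STATES = frozenset({"grey", "white"})


def possible_states(kind, observed=None):
    """Return the set of states an OTU may still be assigned to.

    kind is "observed", "missing" or "inapplicable" (illustrative labels).
    """
    if kind == "observed":
        return frozenset({observed})
    if kind == "missing":
        # Part not preserved: no state can be excluded.
        return STATES
    if kind == "inapplicable":
        # Part absent: every dependent state is excluded.
        return frozenset()
    raise ValueError(kind)


def matrix_code(kind, observed=None):
    """Symbol written in a taxon-character table for the same cases."""
    if kind == "observed":
        return observed
    # Both special cases collapse into the same symbol.
    return "?"


print(possible_states("missing"))
print(possible_states("inapplicable"))
print(matrix_code("missing") == matrix_code("inapplicable"))
```
            </preformat>
            <p>Under these assumptions the two sets are opposite extremes (all states versus none), yet the table entry is the same question mark in both cases.</p>
         </sec>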
         <sec>
            <p>Lee and Bryant <xref rid="bib14" ref-type="bibr">[14]</xref> have accurately identified the core problem of the defective representation of inapplicable states in matrix-based methods, e.g., parsimony analysis, although the idea had been suggested earlier <xref rid="bib8" ref-type="bibr">[8]</xref> and <xref rid="bib26" ref-type="bibr">[26]</xref>. They affirm that inapplicable character states entail “a character hierarchy”. However, they do not manage to find a relevant solution, because their rationale is based on a transformational viewpoint. According to Lee and Bryant, “the recognition of a character with two or more character states implies that transformations have occurred between those states […] The state that becomes a synapomorphy is determined by rooting the tree” (p. 374). On the other hand, Lee and Bryant state on the same page that “in cladistic analysis it is assumed that organismic diversity forms a nested taxic hierarchy of groups within groups. This hierarchy is inferred empirically using characters, which form a complementary nested hierarchy.”</p>
         </sec>
         <sec>
            <p>A hierarchy is a classificatory structure, i.e. a collection of non-empty classes such that every individual belongs to at least one class. Moreover, a hierarchy is defined by the following properties: a class, called the root, contains all the individuals; for every individual, there is a class, called a singleton, that contains that individual alone (the OTU of systematists); most importantly, the intersection of any two classes of the hierarchy is either empty or one of the two classes <xref rid="bib4" ref-type="bibr">[4]</xref>. There is an isomorphism, i.e. an equivalence, between a hierarchy and a rooted tree <xref rid="bib12" ref-type="bibr">[12]</xref>.</p>
         </sec>
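         <sec>
            <p>The three defining properties listed above can be checked mechanically on any collection of classes. The following is a minimal sketch, assuming that classes are given as sets of OTU labels; the function name and the four-OTU example are hypothetical.</p>
            <preformat>
```python
from itertools import combinations


def is_hierarchy(classes, individuals):
    """Check root, singletons and the nested-or-disjoint property."""
    classes = {frozenset(c) for c in classes}
    individuals = frozenset(individuals)
    if frozenset() in classes:
        return False  # classes must be non-empty
    # A class called the root contains all the individuals.
    has_root = individuals in classes
    # Every individual has its own singleton class.
    has_singletons = all(frozenset({i}) in classes for i in individuals)
    # Any two classes are either disjoint or nested.
    nested_or_disjoint = all(
        a.intersection(b) in (frozenset(), a, b)
        for a, b in combinations(classes, 2)
    )
    return has_root and has_singletons and nested_or_disjoint


otus = {"A", "B", "C", "D"}
singletons = [{x} for x in otus]
nested = [otus, {"A", "B"}, {"A", "B", "C"}] + singletons
overlapping = nested + [{"B", "D"}]  # overlaps {"A", "B"} without nesting

print(is_hierarchy(nested, otus))
print(is_hierarchy(overlapping, otus))
```
            </preformat>
            <p>In this sketch the overlapping class fails the test because its intersection with another class is neither empty nor one of the two classes.</p>
         </sec>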
         <sec>
            <p>Transformation processes, however, do not produce hierarchical relationships. This is why, in a cladistic context, taxa are not supposed to transform into one another; they differentiate from more general to more particular, i.e. they show hierarchical relationships. Lee and Bryant clearly show in their <xref rid="fig2" ref-type="fig">Fig. 2</xref> that the representation expressing their idea of character is a rooted tree (see <xref rid="fig5" ref-type="fig">Fig. 5</xref>A), or its equivalent hierarchy (<xref rid="fig5" ref-type="fig">Fig. 5</xref>B). It has been shown that matrices cannot represent hierarchies without distorting hypotheses of homology <xref rid="bib2" ref-type="bibr">[2]</xref>. The only way to correctly represent hypotheses of homology is to have at one's disposal a method that <italic>understands</italic> the hierarchical nature of homology assessments. The only method that permits hierarchies is three-item analysis.</p>
         </sec>
      </sec>
      <sec>
         <label>7</label>
         <title>Representation of missing and inapplicable data in three-item analysis</title>
         <sec>
            <p>Cladistic analysis is not called analysis by chance. The analytical method has very deep historical roots <xref rid="bib3" ref-type="bibr">[3]</xref>. In his Discours de la méthode pour bien conduire sa raison et chercher la vérité dans les sciences, plus la Dioptrique, les Météores et la Géométrie qui sont des essais de cette méthode, René Descartes gives the methodological principles that constitute the basis of analytical reasoning. His second principle states that, when faced with a complex problem, one has “to divide each of the difficulties under examination into as many parts as possible, and as might be necessary for its adequate solution”; the third principle is “to conduct my thoughts in such order that, by commencing with objects the simplest and easiest to know, I might ascend by little and little, and, as it were, step by step, to the knowledge of the more complex.” The analytical method can be reformulated as follows: prior to solving a complex problem, one must decompose it into a set of simpler problems, each admitting of solution, and combine these partial solutions in order to obtain a general solution.</p>
         </sec>
         <sec>
            <p>Finding relationships of a set of taxa is the complex problem that has to be solved. Analysis is performed in order to decompose this global problem into a suite of characters or primary hypotheses of homology <xref rid="bib2" ref-type="bibr">[2]</xref>. This means that the parts that compose taxic relationships are the characters <xref rid="bib17" ref-type="bibr">[17]</xref> and <xref rid="bib18" ref-type="bibr">[18]</xref>. One of the main constraints of the analytical method is that, because the whole, i.e. relationships among taxa, results from the combination of the partial solutions, i.e. secondary homologies, nothing can be learnt about the latter from the former. However, Lee and Bryant claim that “the state that becomes a synapomorphy is determined by rooting the tree.” This assertion violates the analytical principle, because hierarchical relationships among character states are established from the relationships among taxa, which are themselves built from the relationships among character states.</p>
         </sec>
         <sec>
            <p>This circularity in Lee and Bryant's reasoning is not their only misunderstanding. In parsimony and other matrix-based methods, character states define neither hierarchies nor any other formal kind of classificatory structure. This is due to the existence of reversals considered as synapomorphies. In this case, the same state is regarded as plesiomorphous in one part of the tree and apomorphous in another. This is nonsense in hierarchical terms. As an analogy, France is a class inside Europe characterised by French people; France itself contains Paris, a class characterised by Parisians; the class “Paris” would then include a class characterised by French people again!</p>
         </sec>
         <sec>
            <p>Three-item analysis (3ia) is a phylogenetic method that uses hierarchical hypotheses of homology <xref rid="bib20" ref-type="bibr">[20]</xref> and <xref rid="bib21" ref-type="bibr">[21]</xref> and thus respects analytical principles. In <xref rid="fig6" ref-type="fig">Fig. 6</xref>, the conjecture of the existence of a clade is based on the presence of a starry part. No hypothesis is made concerning organisms devoid of a starry part. Concerning the OTU that lacks its upper part, the presence of a starry part cannot be set aside, nor can its absence.</p>
         </sec>
         <sec>
            <p>In order to maximize analysis, each hypothesis of primary homology is decomposed into three-item statements (3is), which correspond to the minimum hierarchical information possible <xref rid="bib21" ref-type="bibr">[21]</xref>. The character shown in <xref rid="fig6" ref-type="fig">Fig. 6</xref> produces two 3is, which are illustrated in <xref rid="fig7" ref-type="fig">Fig. 7</xref>. Note that the OTU which lacks its upper part is not included in the resulting 3is: missing data are just treated as missing in three-item analysis.</p>
         </sec>
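         <sec>
            <p>The decomposition described above can be sketched as a simple enumeration. The OTU labels A to E are hypothetical and do not reproduce the figures; the sketch only assumes that each three-item statement groups two OTUs that share the state to the exclusion of one OTU that lacks it, and that an OTU with missing data is placed in neither set.</p>
            <preformat>
```python
from itertools import combinations


def three_item_statements(inside, outside):
    """List statements (x, (a, b)): a and b share the state, x lacks it."""
    return [
        (x, pair)
        for pair in combinations(sorted(inside), 2)
        for x in sorted(outside)
    ]


has_starry = {"A", "B"}    # starry part observed
lacks_starry = {"C", "D"}  # starry part observed to be absent
missing = {"E"}            # upper part not preserved: placed in neither set

statements = three_item_statements(has_starry, lacks_starry)
for excluded, grouped in statements:
    print(f"{excluded}({grouped[0]},{grouped[1]})")
```
            </preformat>
            <p>With these hypothetical sets the character yields two statements, and the incomplete OTU appears in none of them.</p>
         </sec>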
         <sec>
            <p>
               <xref rid="fig8" ref-type="fig">Fig. 8</xref> shows the 3is deduced from the character represented in <xref rid="fig4" ref-type="fig">Fig. 4</xref>. This character has two states, grey starry form (<xref rid="fig8" ref-type="fig">Fig. 8</xref>B) and white starry form (<xref rid="fig8" ref-type="fig">Fig. 8</xref>C), which are dependent on the presence of a starry form (<xref rid="fig8" ref-type="fig">Fig. 8</xref>A). Nelson and Ladiges <xref rid="bib22" ref-type="bibr">[22]</xref> showed that logical redundancy among 3is can be treated with fractional weighting. The value of the fractional weight (FW) is given by the formula: 2/(number of terminals included in the state). In <xref rid="fig8" ref-type="fig">Fig. 8</xref>A, the 3is deduced from the state “presence of a starry form” bear a fractional weight of ½, because only half of the 3is are necessary to deduce the others. 3ia, as illustrated in <xref rid="fig8" ref-type="fig">Fig. 8</xref>, distinguishes a multistate character from a series of independent binary characters through the removal of redundancy: the four 3is of the most inclusive state that are also deduced from the two less inclusive states are eliminated. The final fractional weight of the 3is deduced from the state “presence of a starry form” is thus 3/4.</p>
         </sec>
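         <sec>
            <p>The formula for the fractional weight can be illustrated with exact fractions. In this sketch the function names and the number of OTUs outside the state are hypothetical. The count of logically independent statements, (n − 1) per excluded OTU, is stated here as an assumption for a state with n terminals; it is consistent with the formula 2/n but is not taken from the text above.</p>
            <preformat>
```python
from fractions import Fraction
from math import comb


def fractional_weight(n_in_state):
    """FW = 2 / (number of terminals included in the state)."""
    if not n_in_state >= 2:
        raise ValueError("a state needs at least two terminals")
    return Fraction(2, n_in_state)


def statement_counts(n_in_state, n_outside):
    """Total and independent 3is for one state (assumed counting rule)."""
    total = comb(n_in_state, 2) * n_outside
    independent = (n_in_state - 1) * n_outside
    return total, independent


# A state with four terminals, as implied by the weight of 1/2 quoted above.
weight = fractional_weight(4)
total, independent = statement_counts(4, n_outside=2)  # n_outside is hypothetical
print(weight, Fraction(independent, total))
```
            </preformat>
            <p>Under the assumed counting rule, the ratio of independent to total statements equals the fractional weight whatever the number of excluded OTUs.</p>
         </sec>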
      </sec>
      <sec>
         <label>8</label>
         <title>Criticisms</title>
         <sec>
            <p>3ia has been criticised because of the addition of question marks <xref rid="bib11" ref-type="bibr">[11]</xref>. Kluge <xref rid="bib11" ref-type="bibr">[11]</xref> writes: “The matrix resulting from the three-taxon transformation has considerable missing data, where none existed before. This amounts to the unscientific exercise of throwing away observations”. As recently stated, a computer program for 3ia that uses no matrix is being developed <xref rid="bib2" ref-type="bibr">[2]</xref>. Moreover, it has been shown that 3ia finds the correct answer even if a matrix is used (e.g., <xref rid="bib16" ref-type="bibr">[16]</xref> and <xref rid="bib25" ref-type="bibr">[25]</xref>). The reason is that parsimony optimises missing data as if they were potentially any of the states. In equivalent compatibility terms <xref rid="bib30" ref-type="bibr">[30]</xref>, question marks are compatible with any state. They lack any empirical content because they do not forbid anything <xref rid="bib28" ref-type="bibr">[28]</xref>. On the other hand, parsimony creates question marks where none existed before, so as to deal with the hierarchical dependence among character states of a multistate character. Parsimony then treats these artificial question marks as missing data. However, no missing data were present in the original hypothesis, as specified by Lee and Bryant <xref rid="bib14" ref-type="bibr">[14]</xref>. Moreover, inapplicable data have a very rich empirical content, because they forbid the assignment to any of the dependent states. We conclude that, contrary to Kluge's statement, it is parsimony (or, in general, matrix-based methods) that creates ambiguity and distortion of hypotheses where none existed before.</p>
         </sec>
      </sec>
   </body>
   <back>
      <ack>
         <title>Acknowledgments</title>
         <p>G. Dubus drew our attention to the impossibility of applying the descriptive model he developed to phylogenetic matrices. We are grateful to D. Williams for useful comments on the manuscript. We also thank an anonymous reviewer.</p>
      </ack>
      <ref-list>
         <ref id="bib1">
            <label>[1]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Brower</surname>
                  <given-names>A.V.Z.</given-names>
               </name>
               <name>
                  <surname>Schawaroch</surname>
                  <given-names>V.</given-names>
               </name>
               <article-title>Three steps of homology assessment</article-title>
               <source>Cladistics</source>
               <volume>12</volume>
               <year>1996</year>
               <page-range>265–272</page-range>
            </element-citation>
         </ref>
         <ref id="bib2">
            <label>[2]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Cao</surname>
                  <given-names>N.</given-names>
               </name>
               <name>
                  <surname>Zaragüeta Bagils</surname>
                  <given-names>R.</given-names>
               </name>
               <name>
                  <surname>Vignes-Lebbe</surname>
                  <given-names>R.</given-names>
               </name>
               <article-title>Hierarchical representation of hypotheses of homology</article-title>
               <source>Geodiversitas</source>
               <volume>29</volume>
               <year>2007</year>
               <page-range>5–15</page-range>
            </element-citation>
         </ref>
         <ref id="bib3">
            <label>[3]</label>
            <element-citation publication-type="book">
               <name>
                  <surname>Descartes</surname>
                  <given-names>R.</given-names>
               </name>
               <source>Discours de la méthode pour bien conduire sa raison et chercher la vérité dans les sciences, plus la dioptrique les météores et la géométrie qui sont des essais de cette méthode</source>
               <year>1637</year>
               <publisher-name>Imprimerie de I. Maire</publisher-name>
               <publisher-loc>Leyde</publisher-loc>
            </element-citation>
         </ref>
         <ref id="bib4">
            <label>[4]</label>
            <mixed-citation>E. Diday, Une représentation visuelle des classes empiétantes : les pyramides, Rapport de recherche, INRIA, Rocquencourt, 1984, p. 75.</mixed-citation>
         </ref>
         <ref id="bib5">
            <label>[5]</label>
            <element-citation publication-type="book">
               <name>
                  <surname>Frege</surname>
                  <given-names>G.</given-names>
               </name>
               <source>Écrits Logiques et Philosophiques</source>
               <year>1971</year>
               <publisher-name>Éditions du Seuil</publisher-name>
               <publisher-loc>Paris</publisher-loc>
            </element-citation>
         </ref>
         <ref id="bib6">
            <label>[6]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Freudenstein</surname>
                  <given-names>J.V.</given-names>
               </name>
               <article-title>Characters, states, and homology</article-title>
               <source>Syst. Biol.</source>
               <volume>54</volume>
               <year>2005</year>
               <page-range>965–973</page-range>
            </element-citation>
         </ref>
         <ref id="bib7">
            <label>[7]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Grande</surname>
                  <given-names>L.</given-names>
               </name>
               <name>
                  <surname>Bemis</surname>
                  <given-names>W.E.</given-names>
               </name>
               <article-title>A comprehensive phylogenetic study of amiid fishes (Amiidae) based on comparative skeletal anatomy. An empirical search for interconnected patterns of natural history, Society of Vertebrate Paleontology Memoir 4</article-title>
               <source>J. Vertebr. Paleontol.</source>
               <volume>18</volume>
               <issue>Suppl.</issue>
               <year>1998</year>
               <page-range>1–690</page-range>
            </element-citation>
         </ref>
         <ref id="bib8">
            <label>[8]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Hawkins</surname>
                  <given-names>J.A.</given-names>
               </name>
               <name>
                  <surname>Hughes</surname>
                  <given-names>C.E.</given-names>
               </name>
               <name>
                  <surname>Scotland</surname>
                  <given-names>R.W.</given-names>
               </name>
               <article-title>Primary homology assessment, characters and character states</article-title>
               <source>Cladistics</source>
               <volume>13</volume>
               <year>1997</year>
               <page-range>275–283</page-range>
            </element-citation>
         </ref>
         <ref id="bib9">
            <label>[9]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Kearney</surname>
                  <given-names>M.</given-names>
               </name>
               <name>
                  <surname>Clark</surname>
                  <given-names>J.M.</given-names>
               </name>
               <article-title>Problems due to missing data in phylogenetic analyses including fossils: a critical review</article-title>
               <source>J. Vertebr. Paleontol.</source>
               <volume>23</volume>
               <year>2003</year>
               <page-range>263–274</page-range>
            </element-citation>
         </ref>
         <ref id="bib10">
            <label>[10]</label>
            <element-citation publication-type="book">
               <name>
                  <surname>Kitching</surname>
                  <given-names>I.J.</given-names>
               </name>
               <name>
                  <surname>Forey</surname>
                  <given-names>P.L.</given-names>
               </name>
               <name>
                  <surname>Humphries</surname>
                  <given-names>C.J.</given-names>
               </name>
               <name>
                  <surname>Williams</surname>
                  <given-names>D.W.</given-names>
               </name>
               <source>Cladistics: The Theory and Practice of Parsimony Analysis</source>
               <edition>second edn</edition>
               <year>1998</year>
               <publisher-name>Oxford University Press</publisher-name>
               <publisher-loc>Oxford, UK</publisher-loc>
            </element-citation>
         </ref>
         <ref id="bib11">
            <label>[11]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Kluge</surname>
                  <given-names>A.G.</given-names>
               </name>
               <article-title>Three-taxon transformation in phylogenetic inference: ambiguity and distortion as regards explanatory power</article-title>
               <source>Cladistics</source>
               <volume>9</volume>
               <year>1993</year>
               <page-range>246–259</page-range>
            </element-citation>
         </ref>
         <ref id="bib12">
            <label>[12]</label>
            <mixed-citation>J. Lebbe, Représentation des concepts en biologie et médecine. Introduction à l’analyse des connaissances et à l’identification assistée par ordinateur, thesis, Université Paris-6, 1991, p. 281.</mixed-citation>
         </ref>
         <ref id="bib13">
            <label>[13]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Lelièvre</surname>
                  <given-names>H.</given-names>
               </name>
               <name>
                  <surname>Zaragüeta-Bagils</surname>
                  <given-names>R.</given-names>
               </name>
               <name>
                  <surname>Rouget</surname>
                  <given-names>I.</given-names>
               </name>
               <article-title>Temporal information, fossil record and phylogeny</article-title>
               <source>C. R. Palevol.</source>
               <volume>6–7</volume>
               <year>2007</year>
            </element-citation>
         </ref>
         <ref id="bib14">
            <label>[14]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Lee</surname>
                  <given-names>D.-C.</given-names>
               </name>
               <name>
                  <surname>Bryant</surname>
                  <given-names>H.N.</given-names>
               </name>
               <article-title>A reconsideration of the coding of inapplicable characters: assumptions and problems</article-title>
               <source>Cladistics</source>
               <volume>15</volume>
               <year>1999</year>
               <page-range>373–378</page-range>
            </element-citation>
         </ref>
         <ref id="bib15">
            <label>[15]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Maddison</surname>
                  <given-names>D.R.</given-names>
               </name>
               <article-title>Missing data versus missing characters in phylogenetic analysis</article-title>
               <source>Syst. Biol.</source>
               <volume>42</volume>
               <year>1993</year>
               <page-range>576–581</page-range>
            </element-citation>
         </ref>
         <ref id="bib16">
            <label>[16]</label>
            <mixed-citation>G.J. Nelson, Reply, Cladistics 9 (1993) 261–265.</mixed-citation>
         </ref>
         <ref id="bib17">
            <label>[17]</label>
            <element-citation publication-type="book">
               <name>
                  <surname>Nelson</surname>
                  <given-names>G.J.</given-names>
               </name>
               <source>La systématique et l’homologie</source>
               <name>
                  <surname>Tassy</surname>
                  <given-names>P.</given-names>
               </name>
               <name>
                  <surname>Lelièvre</surname>
                  <given-names>H.</given-names>
               </name>
               <article-title>Caractères</article-title>
               <year>1994</year>
               <publisher-name>Société française de systématique</publisher-name>
               <publisher-loc>Paris</publisher-loc>
               <page-range>5–28</page-range>
            </element-citation>
         </ref>
         <ref id="bib18">
            <label>[18]</label>
            <element-citation publication-type="book">
               <name>
                  <surname>Nelson</surname>
                  <given-names>G.J.</given-names>
               </name>
               <source>Homology and Systematics</source>
               <name>
                  <surname>Hall</surname>
                  <given-names>B.K.</given-names>
               </name>
               <article-title>The Hierarchical Basis of Comparative Biology</article-title>
               <year>1994</year>
               <publisher-name>Academic Press</publisher-name>
               <publisher-loc>San Diego</publisher-loc>
               <page-range>101–149</page-range>
            </element-citation>
         </ref>
         <ref id="bib19">
            <label>[19]</label>
            <element-citation publication-type="book">
               <name>
                  <surname>Nelson</surname>
                  <given-names>G.J.</given-names>
               </name>
               <name>
                  <surname>Platnick</surname>
                  <given-names>N.I.</given-names>
               </name>
               <source>Systematics and Biogeography: Cladistics and Vicariance</source>
               <year>1981</year>
               <publisher-name>Columbia University Press</publisher-name>
               <publisher-loc>New York</publisher-loc>
            </element-citation>
         </ref>
         <ref id="bib20">
            <label>[20]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Nelson</surname>
                  <given-names>G.J.</given-names>
               </name>
               <name>
                  <surname>Ladiges</surname>
                  <given-names>P.Y.</given-names>
               </name>
               <article-title>Three-area statements: standard assumptions for biogeographic analysis</article-title>
               <source>Syst. Zool.</source>
               <volume>40</volume>
               <year>1991</year>
               <page-range>470–485</page-range>
            </element-citation>
         </ref>
         <ref id="bib21">
            <label>[21]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Nelson</surname>
                  <given-names>G.J.</given-names>
               </name>
               <name>
                  <surname>Platnick</surname>
                  <given-names>N.I.</given-names>
               </name>
               <article-title>Three-taxon statements: a more precise use of parsimony?</article-title>
               <source>Cladistics</source>
               <volume>7</volume>
               <year>1991</year>
               <page-range>351–366</page-range>
            </element-citation>
         </ref>
         <ref id="bib22">
            <label>[22]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Nelson</surname>
                  <given-names>G.J.</given-names>
               </name>
               <name>
                  <surname>Ladiges</surname>
                  <given-names>P.Y.</given-names>
               </name>
               <article-title>Information content and fractional weight of three-item statements</article-title>
               <source>Syst. Biol.</source>
               <volume>41</volume>
               <year>1992</year>
               <page-range>490–494</page-range>
            </element-citation>
         </ref>
         <ref id="bib23">
            <label>[23]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Patterson</surname>
                  <given-names>C.</given-names>
               </name>
               <article-title>Significance of fossils in determining evolutionary relationships</article-title>
               <source>Annu. Rev. Ecol. Syst.</source>
               <volume>12</volume>
               <year>1981</year>
               <page-range>195–223</page-range>
            </element-citation>
         </ref>
         <ref id="bib24">
            <label>[24]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>de Pinna</surname>
                  <given-names>M.C.C.</given-names>
               </name>
               <article-title>Concepts and tests of homology in the cladistic paradigm</article-title>
               <source>Cladistics</source>
               <volume>7</volume>
               <year>1991</year>
               <page-range>367–394</page-range>
            </element-citation>
         </ref>
         <ref id="bib25">
            <label>[25]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Platnick</surname>
                  <given-names>N.I.</given-names>
               </name>
               <article-title>Character optimization and weighting: differences between the standard and three-taxon approaches to phylogenetic inference</article-title>
               <source>Cladistics</source>
               <volume>9</volume>
               <year>1993</year>
               <page-range>267–272</page-range>
            </element-citation>
         </ref>
         <ref id="bib26">
            <label>[26]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Platnick</surname>
                  <given-names>N.I.</given-names>
               </name>
               <article-title>Philosophy and the transformation of cladistics</article-title>
               <source>Syst. Zool.</source>
               <volume>28</volume>
               <year>1979</year>
               <page-range>537–546</page-range>
            </element-citation>
         </ref>
         <ref id="bib27">
            <label>[27]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Pleijel</surname>
                  <given-names>F.</given-names>
               </name>
               <article-title>On character coding for phylogeny reconstruction</article-title>
               <source>Cladistics</source>
               <volume>11</volume>
               <year>1995</year>
               <page-range>309–315</page-range>
            </element-citation>
         </ref>
         <ref id="bib28">
            <label>[28]</label>
            <element-citation publication-type="book">
               <name>
                  <surname>Popper</surname>
                  <given-names>K.</given-names>
               </name>
               <source>Conjectures et Réfutations : la croissance du savoir scientifique</source>
               <year>1972</year>
               <publisher-name>Payot</publisher-name>
               <publisher-loc>Paris</publisher-loc>
            </element-citation>
         </ref>
         <ref id="bib29">
            <label>[29]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Strauss</surname>
                  <given-names>R.E.</given-names>
               </name>
               <name>
                  <surname>Atanasov</surname>
                  <given-names>M.N.</given-names>
               </name>
               <name>
                  <surname>Alves de Oliveira</surname>
                  <given-names>J.</given-names>
               </name>
               <article-title>Evaluation of the principal-component and expectation-maximization methods for estimating missing data in morphometrics studies</article-title>
               <source>J. Vertebr. Paleontol.</source>
               <volume>23</volume>
               <year>2003</year>
               <page-range>284–296</page-range>
            </element-citation>
         </ref>
         <ref id="bib30">
            <label>[30]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Wilkinson</surname>
                  <given-names>M.</given-names>
               </name>
               <article-title>Three-taxon statements: when is a parsimony analysis also a clique analysis?</article-title>
               <source>Cladistics</source>
               <volume>10</volume>
               <year>1994</year>
               <page-range>221–223</page-range>
            </element-citation>
         </ref>
         <ref id="bib31">
            <label>[31]</label>
            <element-citation publication-type="book">
               <name>
                  <surname>Williams</surname>
                  <given-names>D.W.</given-names>
               </name>
               <source>Homologues and homology, phenetics and cladistics: 150 years of progress</source>
               <name>
                  <surname>Williams</surname>
                  <given-names>D.W.</given-names>
               </name>
               <name>
                  <surname>Forey</surname>
                  <given-names>P.L.</given-names>
               </name>
               <article-title>Milestones in Systematics</article-title>
               <year>2004</year>
               <publisher-name>CRC – The Systematics Association</publisher-name>
               <publisher-loc>London</publisher-loc>
               <page-range>191–224</page-range>
            </element-citation>
         </ref>
         <ref id="bib32">
            <label>[32]</label>
            <element-citation publication-type="article">
               <name>
                  <surname>Zaragüeta Bagils</surname>
                  <given-names>R.</given-names>
               </name>
               <name>
                  <surname>Lelièvre</surname>
                  <given-names>H.</given-names>
               </name>
               <name>
                  <surname>Tassy</surname>
                  <given-names>P.</given-names>
               </name>
               <article-title>Temporal paralogy, cladograms, and the quality of the fossil record</article-title>
               <source>Geodiversitas</source>
               <volume>26</volume>
               <year>2004</year>
               <page-range>381–389</page-range>
            </element-citation>
         </ref>
      </ref-list>
   </back>
   <floats-group>
      <fig id="fig1">
         <label>Fig. 1</label>
         <caption>
            <p>Theoretical taxonomic sampling including terminal taxa or operational taxonomic units (OTUs). Hypotheses of primary homology concern grouping OTUs into classes, called here homologues, and defining relationships between these classes.</p>
            <p>Fig. 1. Échantillonnage taxonomique théorique incluant des taxons terminaux ou unités taxonomiques opérationnelles (UTOs). Les hypothèses d’homologie primaire consistent à grouper les UTOs dans des classes appelées ici homologues, et à définir les relations entre ces classes.</p>
         </caption>
         <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="main.assets/gr1.jpg"/>
      </fig>
      <fig id="fig2">
         <label>Fig. 2</label>
         <caption>
            <p>The hypothesis of primary homology corresponds to the theory of the systematist. The formal classificatory structure that best represents this hypothesis is the hierarchy. Here, the theory is that the colour of the starry form is a relevant argument for grouping some organisms within the sample.</p>
            <p>Fig. 2. L’hypothèse d’homologie primaire constitue la théorie du systématicien. La structure classificatoire formelle qui représente le mieux cette hypothèse est la hiérarchie. Ici, la théorie est que la couleur de la forme étoilée est un argument pertinent pour grouper certains organismes au sein de l’échantillon.</p>
         </caption>
         <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="main.assets/gr2.jpg"/>
      </fig>
      <fig id="fig3">
         <label>Fig. 3</label>
         <caption>
            <p>An organism that lacks its upper part is considered here. It cannot be assigned either to one of the homologues or to the root, i.e. the class containing all the specimens assigned to the homologues plus the specimens not belonging to any homologue. However, we cannot exclude the possibility that this organism belongs to one of the classes of the hierarchy.</p>
            <p>Fig. 3. Un organisme avec la partie supérieure manquante est considéré ici. Il est impossible de l’attribuer à l’un des homologues ou à la racine, c’est-à-dire la classe contenant tous les spécimens attribués aux homologues ainsi que les spécimens n’appartenant à aucun homologue. Cependant, on ne peut exclure que cet organisme appartienne à l’une des classes de la hiérarchie.</p>
         </caption>
         <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="main.assets/gr3.jpg"/>
      </fig>
      <fig id="fig4">
         <label>Fig. 4</label>
         <caption>
            <p>Inapplicable data appear when some character states depend on more inclusive ones. In this example, the states “grey” or “white” depend on the state “starry part”.</p>
            <p>Fig. 4. Les données non applicables apparaissent lorsque des états de caractère sont dépendants d’états de caractère plus inclusifs. Dans cet exemple, les états de caractère « couleur grise » ou « couleur blanche » dépendent de l’état de caractère « partie étoilée ».</p>
         </caption>
         <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="main.assets/gr4.jpg"/>
      </fig>
      <fig id="fig5">
         <label>Fig. 5</label>
         <caption>
            <p>(<bold>A</bold>) Hierarchical relationships between “a part and its character variables”, following Lee and Bryant <xref rid="bib14" ref-type="bibr">[14]</xref>, <xref rid="fig2" ref-type="fig">Fig. 2</xref>, modified; this representation corresponds to a rooted tree. (<bold>B</bold>) Hierarchical representation isomorphic to the tree shown in (<bold>A</bold>). The character is the hierarchy and the character states are the classes of the hierarchy <xref rid="bib2" ref-type="bibr">[2]</xref>.</p>
            <p>Fig. 5. (<bold>A</bold>) Relations hiérarchiques entre « une partie et ses variables de caractère », selon Lee et Bryant <xref rid="bib14" ref-type="bibr">[14]</xref>, <xref rid="fig2" ref-type="fig">Fig. 2</xref>, modifiée ; cette représentation correspond à un arbre raciné. (<bold>B</bold>) Représentation hiérarchique isomorphe de l’arbre montré en (<bold>A</bold>). Le caractère correspond à la hiérarchie et les états de caractère constituent les classes de la hiérarchie <xref rid="bib2" ref-type="bibr">[2]</xref>.</p>
         </caption>
         <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="main.assets/gr5.jpg"/>
      </fig>
      <fig id="fig6">
         <label>Fig. 6</label>
         <caption>
            <p>Hierarchical representation of a hypothesis of primary homology in three-item analysis.</p>
            <p>Fig. 6. Représentation hiérarchique d’une hypothèse d’homologie primaire en analyse à trois éléments.</p>
         </caption>
         <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="main.assets/gr6.jpg"/>
      </fig>
      <fig id="fig7">
         <label>Fig. 7</label>
         <caption>
            <p>Three-item statements (3is) resulting from the character shown in <xref rid="fig6" ref-type="fig">Fig. 6</xref>.</p>
            <p>Fig. 7. Assertions à trois éléments (3is) résultant du caractère montré en <xref rid="fig6" ref-type="fig">Fig. 6</xref>.</p>
         </caption>
         <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="main.assets/gr7.jpg"/>
      </fig>
      <fig id="fig8">
         <label>Fig. 8</label>
         <caption>
            <p>3is deduced from the character shown in <xref rid="fig4" ref-type="fig">Fig. 4</xref>. (<bold>A</bold>) 3is deduced from the state “presence of a starry form”. (<bold>B</bold>) 3is deduced from the state “grey starry form”. (<bold>C</bold>) 3is deduced from the state “white starry form”. Dependence among character states makes it possible to remove part of the logical redundancy in the most inclusive state.</p>
            <p>Fig. 8. 3is déduits du caractère montré dans la <xref rid="fig4" ref-type="fig">Fig. 4</xref>. (<bold>A</bold>) 3is déduits de l’état de caractère « présence de partie étoilée ». (<bold>B</bold>) 3is déduits de l’état de caractère « partie étoilée grise ». (<bold>C</bold>) 3is déduits de l’état de caractère « partie étoilée blanche ». La dépendance entre états de caractère permet d’éliminer une partie de la redondance logique au niveau de l’état le plus inclusif.</p>
         </caption>
         <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="main.assets/gr8.jpg"/>
      </fig>
      <table-wrap id="tbl1">
         <label>Table 1</label>
         <caption>
            <p>Matrix representation of a hypothesis of homology including a hierarchical structure of character states</p>
            <p>Tableau 1 Représentation en matrice d’une hypothèse d’homologie incluant une structure hiérarchique d’états de caractère</p>
         </caption>
         <oasis:table xmlns:oasis="http://www.niso.org/standards/z39-96/ns/oasis-exchange/table"/>
      </table-wrap>
   </floats-group>
</article>